Using Deepchecks Vision With a Few Lines of Code#

Deepchecks Vision is built to validate your data and model, however complex they may be. That said, sometimes there is no need to write a full-blown ClassificationData or DetectionData class. For a simple classification task, quite a few checks can be run with only a few lines of code. In this tutorial, we will show you how to run all the checks that do not require a model on a simple classification task.

This is ideal, for example, when you receive a new dataset for a classification task. Running these checks on the dataset before you even start training gives you a quick idea of what the dataset looks like and what potential issues it contains.

Downloading the data#

The data is the hymenoptera (ants and bees) dataset hosted on the PyTorch download server. We will download it and extract it to the current directory.

import urllib.request
import zipfile
import os

url = 'https://download.pytorch.org/tutorial/hymenoptera_data.zip'
urllib.request.urlretrieve(url, 'hymenoptera_data.zip')

with zipfile.ZipFile('hymenoptera_data.zip', 'r') as zip_ref:
    zip_ref.extractall('.')

# Rename val folder to test, because the simple classification task expects a test folder.
if not os.path.exists('hymenoptera_data/test'):
    os.rename('hymenoptera_data/val', 'hymenoptera_data/test')
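
Before loading the data, it can be worth confirming that the extraction and rename produced what you expect. The following is a minimal sketch, not part of Deepchecks: `count_images` is a hypothetical helper that assumes one folder per split, one subfolder per class, and `.jpg` files, and counts the images it finds.

```python
from collections import Counter
from pathlib import Path


def count_images(root, extension="jpg"):
    """Return {split_name: Counter({class_name: n_images})}.

    Hypothetical helper: assumes a root/<split>/<class>/<image> tree.
    A root folder that does not exist yields an empty dict.
    """
    counts = {}
    root = Path(root)
    if not root.is_dir():
        return counts
    for split_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        per_class = Counter()
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            # Only files with the requested extension are counted.
            per_class[class_dir.name] = sum(
                1 for _ in class_dir.glob(f"*.{extension}")
            )
        counts[split_dir.name] = per_class
    return counts


if __name__ == "__main__":
    for split, per_class in count_images("hymenoptera_data").items():
        print(split, dict(per_class))
```

A split with no images, or a class whose count is far lower than the others, is easier to spot here than after the suite has already run.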

Loading a Simple Classification Dataset#

A simple classification dataset is an image dataset structured in the following way:

  • root/
    • train/
      • class1/

        image1.jpeg

    • test/
      • class1/

        image1.jpeg
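
The layout above can be checked programmatically before handing the folder to Deepchecks. This is a rough sketch under stated assumptions: `find_layout_problems` is a hypothetical helper, and flagging a class that appears in only one split is a choice made here for illustration, not a documented requirement of the loader.

```python
from pathlib import Path


def find_layout_problems(root, splits=("train", "test")):
    """Return a list of problem descriptions; an empty list means none found.

    Hypothetical helper: checks that each split folder exists, contains at
    least one class folder, and that every class appears in every split.
    """
    root = Path(root)
    problems = []
    classes = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split_dir}")
            continue
        names = {p.name for p in split_dir.iterdir() if p.is_dir()}
        if not names:
            problems.append(f"no class folders in: {split_dir}")
        classes[split] = names
    # Compare class names only when every split folder was found.
    if len(classes) == len(splits):
        all_names = set().union(*classes.values())
        for split, names in classes.items():
            for missing in sorted(all_names - names):
                problems.append(f"class '{missing}' has no folder in {split}/")
    return problems
```

Calling `find_layout_problems('hymenoptera_data')` after the rename step should return an empty list if both the train and test folders are in place with matching class folders.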

from deepchecks.vision.simple_classification_data import load_dataset

train_ds = load_dataset('hymenoptera_data', train=True, object_type='VisionData', image_extension='jpg')
test_ds = load_dataset('hymenoptera_data', train=False, object_type='VisionData', image_extension='jpg')

Running Deepchecks' full suite#

That's it: the classification data objects are defined, and we are ready to run the train_test_validation suite:

from deepchecks.vision.suites import train_test_validation

suite = train_test_validation()
result = suite.run(train_ds, test_ds)

Out:

Validating Input:   0%| | 0/1 [00:00<?, ? /s]

Ingesting Batches - Train Dataset: 100%|########| 8/8 [00:21<00:00,  2.33s/ Batch, Batch=88%]

Ingesting Batches - Test Dataset: 100%|#####| 5/5 [00:15<00:00,  3.22s/ Batch, Batch=80%]

Computing Checks:   0%|      | 0/6 [00:00<?, ? Check/s, Check=Heatmap Comparison]

Computing Checks:  17%|#     | 1/6 [00:00<00:00, 18.64 Check/s, Check=Train Test Label Drift]

Computing Checks:  33%|##    | 2/6 [00:00<00:00, 20.79 Check/s, Check=Train Test Prediction Drift]

Computing Checks:  50%|###   | 3/6 [00:00<00:00, 31.09 Check/s, Check=Image Property Drift]

Computing Checks:  67%|####  | 4/6 [00:00<00:00,  9.72 Check/s, Check=Image Dataset Drift]

Computing Checks:  83%|##### | 5/6 [00:00<00:00,  9.09 Check/s, Check=Simple Feature Contribution]

Computing Checks: 100%|######| 6/6 [00:00<00:00,  5.94 Check/s, Check=Simple Feature Contribution]

Observing the results:#

The results can be saved as an HTML file with the following code:

result.save_as_html('output.html')

Or, if working inside a notebook, the output can be displayed directly by making the result object the last expression in a cell:

result


Total running time of the script: ( 0 minutes 41.641 seconds)
